Multi-class Text Categorization with Error Correcting Codes
Abstract
Automatic text categorization has become a vital topic in many applications. Imagine, for example, the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding scheme for classification needs resources that increase linearly with the number of classes. A different solution uses an error correcting code, whose length increases only with O(log2(n)). In this paper we investigate the potential of error correcting codes for text categorization with many categories. The main result is that multi-class codes have advantages for classes which comprise only a small fraction of the data.
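The contrast between 1-of-n coding and a shorter binary code can be illustrated with a minimal sketch. It is not the paper's implementation: the class labels, the five-bit codebook, and the function names below are hypothetical, chosen only to show the idea. Each bit position would be predicted by one binary classifier, and a predicted bit string is decoded to the class whose codeword is nearest in Hamming distance. Note that a code with error-correcting ability needs some redundant bits beyond the bare minimum of ceil(log2(n)).

```python
def one_of_n_length(n_classes):
    """1-of-n coding uses one output per class, so length grows linearly."""
    return n_classes


def min_binary_length(n_classes):
    """Fewest bits that give n_classes (>= 2) distinct codewords: ceil(log2(n))."""
    return (n_classes - 1).bit_length()


# Hypothetical codebook for four classes (assumption, not from the paper).
# Any two codewords differ in at least 3 positions, so one wrong bit
# can still be decoded to the correct class.
CODEBOOK = {
    "sports":   (0, 0, 0, 0, 0),
    "politics": (0, 0, 1, 1, 1),
    "science":  (1, 1, 0, 0, 1),
    "finance":  (1, 1, 1, 1, 0),
}


def hamming(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, codebook=CODEBOOK):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(codebook[label], bits))


if __name__ == "__main__":
    for n in (4, 100, 1000):
        print(n, one_of_n_length(n), min_binary_length(n))
    # The last bit of the "politics" codeword is flipped; decoding recovers it.
    print(decode((0, 0, 1, 1, 0)))
```

With 1000 classes, 1-of-n coding needs 1000 outputs while 10 bits are enough to tell the classes apart; the extra bits in the codebook above are what buy tolerance to a single misclassified bit.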
Similar resources
Multi-class Classification with Error Correcting Codes
Automatic text categorization has become a vital topic in many applications. Imagine, for example, the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding scheme for classification needs resources that increase linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. I...
Error-Correcting Output Codes for Multi-Label Text Categorization
When a sample belongs to more than one label from a set of available classes, the classification problem (known as multi-label classification) becomes more complicated. Text data, widely available nowadays on the World Wide Web, is an obvious example of such a task. This paper presents a new method for multi-label text categorization created by modifying the Error-Correcting Output...
Ranking Error-Correcting Output Codes for Class Retrieval
Error-Correcting Output Codes (ECOC) is a general framework for combining binary classification in order to address the multi-class categorization problem. In this paper, we include contextual and semantic information in the decoding process of the ECOC framework, defining an ECOC-rank methodology. Altering the ECOC output values by means of the adjacency of classes based on features and class ...
Loss-Weighted Decoding for Error-Correcting Output Coding
Multi-class classification is a challenging problem for several applications in Computer Vision. The Error Correcting Output Codes (ECOC) technique represents a general framework capable of extending any binary classification process to the multi-class case. In this work, we present a novel decoding strategy that takes advantage of the ECOC coding to outperform the up to now existing decoding stra...
N-ary Error Correcting Coding Scheme
The coding matrix design plays a fundamental role in the prediction performance of error correcting output codes (ECOC)-based multi-class tasks. In many-class classification problems, e.g., fine-grained categorization, it is difficult to distinguish subtle between-class differences under existing coding schemes due to the limited choice of coding values. In this paper, we investigate whether ...